An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family
نویسندگان
چکیده
In a multi-class classification problem, it is standard to model the output of a neural network as a categorical distribution conditioned on the inputs. The output must therefore be positive and sum to one, which is traditionally enforced by a softmax. This probabilistic mapping allows to use the maximum likelihood principle, which leads to the well-known log-softmax loss. However the choice of the softmax function seems somehow arbitrary as there are many other possible normalizing functions. It is thus unclear why the log-softmax loss would perform better than other loss alternatives. In particular Vincent et al. (2015) recently introduced a class of loss functions, called the spherical family, for which there exists an efficient algorithm to compute the updates of the output weights irrespective of the output size. In this paper, we explore several loss functions from this family as possible alternatives to the traditional log-softmax. In particular, we focus our investigation on spherical bounds of the log-softmax loss and on two spherical log-likelihood losses, namely the log-Spherical Softmax suggested by Vincent et al. (2015) and the log-Taylor Softmax that we introduce. Although these alternatives do not yield as good results as the log-softmax loss on two language modeling tasks, they surprisingly outperform it in our experiments on MNIST and CIFAR10, suggesting that they might be relevant in a broad range of applications.
منابع مشابه
The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family
Despite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size of problems that we are able to tackle with current hardware. Second, it remains unclear how close it matches the task loss such as the top-k error rate or ...
متن کاملExact gradient updates in time independent of output size for the spherical loss family
An important class of problems involves training deep neural networks with sparse prediction targets of very high dimension D. These occur naturally in e.g. neural language models or the learning of word-embeddings, often posed as predicting the probability of next words among a vocabulary of sizeD (e.g. 200 000). Computing the equally large, but typically non-sparse D-dimensional output vector...
متن کاملValue-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax
This paper proposes “Value-Difference Based Exploration combined with Softmax action selection” (VDBE-Softmax) as an adaptive exploration/exploitation policy for temporal-difference learning. The advantage of the proposed approach is that exploration actions are only selected in situations when the knowledge about the environment is uncertain, which is indicated by fluctuating values during lea...
متن کاملExploration of Tree-based Hierarchical Softmax for Recurrent Language Models
Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora but they are extremely slow as they predict candidate words from a large vocabulary during training and inference. As an alternative to gradient approximation an...
متن کاملA novel fuzzy multi-criteria decision-making methodology based upon the spherical fuzzy sets with a real case study
The choice of roll stabilization system is critical for many types of ships. For warships where operational activities are fast and the concept of time is very effective, determining the most appropriate of these systems is of particular importance. Some operations, such as the landing of the helicopter on board, are critical for naval ships. Unwanted rolling motion makes this difficult. In add...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1511.05042 شماره
صفحات -
تاریخ انتشار 2015